risk difference


Investigating Trade-offs in Utility, Fairness and Differential Privacy in Neural Networks

Pannekoek, Marlotte, Spigler, Giacomo

arXiv.org Artificial Intelligence

To enable the ethical and legal use of machine learning algorithms, they must both be fair and protect the privacy of those whose data are being used. However, implementing privacy and fairness constraints may come at the cost of utility (Jayaraman & Evans, 2019; Gong et al., 2020). This paper investigates the privacy-utility-fairness trade-off in neural networks by comparing a Simple (S-NN), a Fair (F-NN), a Differentially Private (DP-NN), and a Differentially Private and Fair Neural Network (DPF-NN), evaluating differences in performance on metrics for privacy (epsilon, delta), fairness (risk difference), and utility (accuracy). In the scenario with the strongest privacy guarantees considered (epsilon = 0.1, delta = 0.00001), the DPF-NN achieved a better risk difference than all the other neural networks, with only marginally lower accuracy than the S-NN and DP-NN. This model is considered fair, as its risk difference fell below both the strict (0.05) and lenient (0.1) thresholds. However, while the accuracy of the proposed model improved on previous work by Xu, Yuan and Wu (2019), its risk difference was worse.
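
The fairness metric used here, risk difference, is simply the gap in positive-prediction rates between the protected and unprotected groups. A minimal sketch of the metric and the paper's two thresholds (function and variable names are mine, not the authors'):

```python
import numpy as np

def risk_difference(y_pred, protected):
    """Absolute gap in positive-prediction rates between the two groups."""
    y_pred = np.asarray(y_pred)
    protected = np.asarray(protected, dtype=bool)
    rate_protected = y_pred[protected].mean()
    rate_unprotected = y_pred[~protected].mean()
    return abs(rate_protected - rate_unprotected)

# Toy predictions: 1 = favourable outcome.
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
protected = np.array([1, 1, 1, 1, 0, 0, 0, 0])

rd = risk_difference(y_pred, protected)
print(f"risk difference = {rd:.3f}")
print("fair (strict, 0.05):", rd < 0.05)
print("fair (lenient, 0.10):", rd < 0.10)
```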


Mitigating Bias in Set Selection with Noisy Protected Attributes

Mehrotra, Anay, Celis, L. Elisa

arXiv.org Machine Learning

Subset selection algorithms are ubiquitous in AI-driven applications, including online recruiting portals and image search engines, so it is imperative that these tools are not discriminatory on the basis of protected attributes such as gender or race. Current fair subset selection algorithms assume that the protected attributes are known as part of the dataset. However, attributes may be noisy due to errors during data collection or because they are imputed (as is often the case in real-world settings). While a wide body of work addresses the effect of noise on the performance of machine learning algorithms, its effect on fairness remains largely unexamined. We find that in the presence of noisy protected attributes, attempting to increase fairness without considering the noise can, in fact, decrease the fairness of the result! Towards addressing this, we consider an existing noise model in which there is probabilistic information about the protected attributes (e.g., [19, 32, 56, 44]), and ask whether fair selection is possible under noisy conditions. We formulate a ``denoised'' selection problem that works for a large class of fairness metrics; given the desired fairness goal, the solution to the denoised problem violates the goal by at most a small multiplicative amount with high probability. Although the denoised problem turns out to be NP-hard, we give a linear-programming-based approximation algorithm for it. We empirically evaluate our approach on both synthetic and real-world datasets. Our empirical results show that this approach can produce subsets which significantly improve the fairness metrics despite the presence of noisy protected attributes and, compared to prior noise-oblivious approaches, has better Pareto trade-offs between utility and fairness.
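
The paper's actual denoised program and rounding scheme are more involved, but the core idea can be sketched as an LP: since the protected attributes are only known probabilistically, enforce the fairness goal on the expected group counts rather than on the (unknown) true counts. A minimal sketch with hypothetical utilities and attribute probabilities, using scipy:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, k = 20, 8          # candidates, subset size
alpha = 0.4           # fairness goal: >= 40% of the subset from group 1, in expectation

u = rng.uniform(size=n)   # utility of each candidate
p = rng.uniform(size=n)   # P(candidate i in group 1), e.g. from an imputation model

# LP relaxation: maximize u @ x  s.t.  sum(x) = k,  p @ x >= alpha * k,  0 <= x <= 1.
res = linprog(
    c=-u,                              # linprog minimizes, so negate the utility
    A_ub=[-p], b_ub=[-alpha * k],      # expected group-1 count >= alpha * k
    A_eq=[np.ones(n)], b_eq=[k],
    bounds=[(0, 1)] * n,
    method="highs",
)

# Naive rounding: take the k largest fractional values (the paper gives a
# principled rounding with a multiplicative violation guarantee).
selected = np.argsort(res.x)[-k:]
print("selected:", sorted(selected))
print("expected group-1 share:", p[selected].sum() / k)
```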


Fairness-aware Classification: Criterion, Convexity, and Bounds

Wu, Yongkai, Zhang, Lu, Wu, Xintao

arXiv.org Machine Learning

Fairness-aware classification is receiving increasing attention in the machine learning field. Recent research proposes formulating fairness-aware classification as a constrained optimization problem. However, several limitations exist in previous works due to the lack of a theoretical framework guiding the formulation. In this paper, we propose a general framework for learning fair classifiers that addresses these limitations. The framework formulates various commonly used fairness metrics as convex constraints that can be directly incorporated into classic classification models. Within the framework, we propose a constraint-free criterion on the training data which ensures that any classifier learned from the data is fair. We also derive the constraints which ensure that the real fairness metric is satisfied when surrogate functions are used to achieve convexity. Our framework can be used to formulate fairness-aware classification with fairness guarantees and computational efficiency. Experiments using real-world datasets demonstrate our theoretical results and show the effectiveness of the proposed framework and methods.
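
As one concrete instance of the kind of convex constraint such a framework covers, a decision-boundary covariance constraint (linear in the model weights) can be bolted onto logistic regression. A minimal sketch with cvxpy on synthetic data; this is an illustration of the general recipe, not the paper's specific derivation:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = np.sign(rng.normal(size=n))    # labels in {-1, +1}
z = rng.integers(0, 2, size=n)     # protected attribute in {0, 1}

w = cp.Variable(d)
log_loss = cp.sum(cp.logistic(cp.multiply(-y, X @ w))) / n

# Convex fairness constraint: bound the covariance between the protected
# attribute and the (signed) distance to the decision boundary.
zc = z - z.mean()
cov = (zc @ (X @ w)) / n
c = 0.01  # tightness of the fairness constraint

prob = cp.Problem(cp.Minimize(log_loss), [cp.abs(cov) <= c])
prob.solve()
print("weights:", np.round(w.value, 3))
print("constrained covariance:", float(zc @ (X @ w.value)) / n)
```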


An Outcome Model Approach to Translating Randomized Controlled Trial Results to a Target Population

Goldstein, Benjamin A., Phelan, Matthew, Pagidipati, Neha J., Holman, Rury R., Pencina, Michael J., Stuart, Elizabeth A.

arXiv.org Machine Learning

Participants enrolled into randomized controlled trials (RCTs) often do not reflect real-world populations. Previous research on how best to translate RCT results to target populations has focused on weighting the RCT data to resemble the target data. Simulation work, however, has suggested that an outcome model approach may be preferable. Here we describe such an approach using source data from the 2x2 factorial NAVIGATOR trial, which evaluated the impact of valsartan and nateglinide on cardiovascular outcomes and new-onset diabetes in a "pre-diabetic" population. Our target data consisted of people with "pre-diabetes" served at our institution. We used Random Survival Forests to develop separate outcome models for each of the four treatments, estimated the 5-year risk difference for progression to diabetes, estimated the treatment effects in our local patient population as well as subpopulations, and compared the results to the traditional weighting approach.
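
A minimal sketch of the outcome-model workflow as described, using scikit-survival's RandomSurvivalForest; the column layout, arm labels, and 60-month horizon are placeholders, not the NAVIGATOR analysis itself:

```python
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

HORIZON = 60.0  # 5 years, in months (assumes follow-up extends past this point)

def fit_arm_model(X_arm, event, time):
    """One outcome model per treatment arm, fit on the trial (source) data."""
    y = Surv.from_arrays(event=event, time=time)
    rsf = RandomSurvivalForest(n_estimators=200, min_samples_leaf=15, random_state=0)
    return rsf.fit(X_arm, y)

def mean_risk_at_horizon(model, X_target):
    """Average predicted 5-year risk of the outcome in the target population."""
    surv_fns = model.predict_survival_function(X_target)
    return float(np.mean([1.0 - fn(HORIZON) for fn in surv_fns]))

# Hypothetical usage with one model per arm of the 2x2 factorial design:
# models = {arm: fit_arm_model(X[trt == arm], event[trt == arm], time[trt == arm])
#           for arm in ("val+nat", "val+placebo", "placebo+nat", "placebo+placebo")}
# risk_diff = (mean_risk_at_horizon(models["val+nat"], X_target)
#              - mean_risk_at_horizon(models["placebo+placebo"], X_target))
```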


FairGAN: Fairness-aware Generative Adversarial Networks

Xu, Depeng, Yuan, Shuhan, Zhang, Lu, Wu, Xintao

arXiv.org Machine Learning

Fairness-aware learning is increasingly important in data mining. Discrimination prevention aims to remove discrimination from the training data before it is used for predictive analysis. In this paper, we focus on fair data generation that ensures the generated data is discrimination-free. Inspired by generative adversarial networks (GANs), we present fairness-aware generative adversarial networks, called FairGAN, which learn a generator that produces fair data while preserving good data utility. Compared with naive fair data generation models, FairGAN further ensures that classifiers trained on the generated data achieve fair classification on real data.
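
A minimal PyTorch sketch of the two-adversary idea the abstract describes: one discriminator pushes generated samples toward the real data distribution, a second tries to recover the protected attribute from the generated data, and the generator is trained to fool both. Dimensions, architectures, and the loss weighting are mine; see the paper for the actual model:

```python
import torch
import torch.nn as nn

Z_DIM, X_DIM = 16, 8  # noise and feature dimensions (placeholders)

G = nn.Sequential(nn.Linear(Z_DIM + 1, 32), nn.ReLU(), nn.Linear(32, X_DIM))  # generator, conditioned on s
D1 = nn.Sequential(nn.Linear(X_DIM, 32), nn.ReLU(), nn.Linear(32, 1))         # real vs. fake
D2 = nn.Sequential(nn.Linear(X_DIM, 32), nn.ReLU(), nn.Linear(32, 1))         # predicts protected attribute
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(list(D1.parameters()) + list(D2.parameters()), lr=1e-3)

def train_step(x_real, s, lam=1.0):
    batch = x_real.size(0)
    z = torch.randn(batch, Z_DIM)
    x_fake = G(torch.cat([z, s], dim=1))

    # Discriminators: D1 separates real from fake, D2 recovers s from fake data.
    d_loss = (bce(D1(x_real), torch.ones(batch, 1))
              + bce(D1(x_fake.detach()), torch.zeros(batch, 1))
              + bce(D2(x_fake.detach()), s))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: fool D1 (realistic data) and fool D2 (no signal about s).
    g_loss = (bce(D1(x_fake), torch.ones(batch, 1))
              - lam * bce(D2(x_fake), s))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Toy usage: random "real" data with a binary protected attribute.
x = torch.randn(64, X_DIM)
s = torch.randint(0, 2, (64, 1)).float()
print(train_step(x, s))
```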


You'll have to figure this one out for yourselves.

#artificialintelligence

Estimated differences: adjusted mortality: 11.07%.

Regarding the number of regression parameters: not explicitly listed, but judging from the following paragraph, I suspect there are at least hundreds of regression parameters (such as an indicator of the medical school attended).

"We accounted for patient characteristics, physician characteristics, and hospital fixed effects. Patient characteristics included patient age in 5-year increments (the oldest group was categorized as ≥95 years), sex, race/ethnicity (non-Hispanic white, non-Hispanic black, Hispanic, and other), primary diagnosis (Medicare Severity Diagnosis Related Group), 27 coexisting conditions (determined using the Elixhauser comorbidity index [28]), median annual household income estimated from residential zip codes (in deciles), an indicator variable for Medicaid coverage, and indicator variables for year. Physician characteristics included physician age in 5-year increments (the oldest group was categorized as ≥70 years), indicator variables for the medical schools from which the physicians graduated, and type of medical training (ie, allopathic vs osteopathic [29] training)."
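
The parameter count comes from expanding each categorical covariate into indicator columns. A toy sketch of how quickly that adds up; the level counts are my guesses, not the study's:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age_group": rng.integers(0, 16, n),    # 5-year increments, ~16 levels
    "drg": rng.integers(0, 300, n),         # primary diagnosis (MS-DRG), hundreds of levels
    "med_school": rng.integers(0, 150, n),  # school the physician graduated from
    "hospital": rng.integers(0, 200, n),    # hospital fixed effects
})
# One indicator column per level (minus a reference level per covariate).
X = pd.get_dummies(df.astype("category"), drop_first=True)
print(X.shape[1], "regression parameters from just four categorical covariates")
```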